First, let’s load the packages we’ll use:
require(rjags)
require(Rgraphviz)
require(data.table)
Create some simulated data:
set.seed(432104)
OK. So let’s define a simple model that serves as the basis for both our simulated data and our probabilistic model of inference. First, we’ll assume that there is in fact a latent ability attribute, \(\theta_i\), for each student \(i \in \{1,\dots,n\}\) in our sample. Further, we’ll assume that this latent trait is accurately measured by an unobserved RIT score, which we’ll denote \(r_{i}\). In other words, RIT scores are a direct measure of \(\theta_i\). Luckily for us, students take a test that gives us an estimate of RIT scores: the MAP. So we assume that the observed RIT, \(\hat{r}_i\), is a realization from a distribution centered on \(r_i\). Finally, we assume that all other academic measures (interim assessments, Khan objectives, individual test items, even Do Nows/Exit Tickets) are themselves noisy measures that depend on the RIT score. So we have an observed realization of a RIT score for each student, as well as realizations of all the other academic indicators; all other random variables and parameters need to be estimated. Our goal here is to make inferences about \(r_i\) given our observations of the other academic indicators.
Here’s a graphical representation of our probability model:
rit.graph <- new("graphNEL",
                 nodes = c("theta[i]",
                           "r[i]",
                           "rhat[i]",
                           "a[i]",
                           "k[i]",
                           "t[i]",
                           "ahat[i]",
                           "khat[i]",
                           "that[i]"),
                 edgemode = "directed")
rit.graph <- addEdge("theta[i]", "r[i]", rit.graph, 1)
rit.graph <- addEdge("r[i]", "rhat[i]" , rit.graph, 1)
rit.graph <- addEdge("rhat[i]", "a[i]" , rit.graph, 1)
rit.graph <- addEdge("rhat[i]", "k[i]" , rit.graph, 1)
rit.graph <- addEdge("rhat[i]", "t[i]" , rit.graph, 1)
rit.graph <- addEdge("a[i]", "ahat[i]" , rit.graph, 1)
rit.graph <- addEdge("k[i]", "khat[i]" , rit.graph, 1)
rit.graph <- addEdge("t[i]", "that[i]" , rit.graph, 1)
plot(rit.graph)
Now, let’s simulate some data based on this basic probability model. The \(\theta_i\)’s need a prior, and I’ll use the school-level norms for fall 5th grade math (because I’m on a plane and don’t have the student-level norms with me; we can change these later). Thinking about this, we can collapse the \(\theta_i\)’s and \(r_i\)’s, since we are assuming \(\theta_i\) exists on the RIT scale. I’ll use \(r_i\) to denote the RIT-scaled latent variable.
set.seed(432104)
n <- 90  # number of students
r <- rnorm(n = n, mean = 211.39, sd = 5.83)  # school-level norms, fall 5th grade math
The \(\theta_i\)s are then noisily measured by the MAP test. I’ll assume a standard deviation of about 3, but that is completely arbitrary.
rhat <- rnorm(n, r, sd = rep(3, times = n))
head(rhat)
## [1] 211.9 226.3 195.8 217.3 201.2 227.7
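As an aside, `rnorm` accepts a vector of standard deviations, so `rep(3, times = n)` recycles to exactly the same thing as a scalar `sd = 3`. A quick check:

```r
set.seed(1)
x1 <- rnorm(5, mean = 0, sd = rep(3, times = 5))  # vector sd
set.seed(1)
x2 <- rnorm(5, mean = 0, sd = 3)                  # scalar sd
identical(x1, x2)  # TRUE
```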
OK. Let’s make some assumptions about how \(\hat{r}\) is related to each of the three academic indicators. Let’s start with some arbitrary assessment graded on a zero-to-one scale (or zero to 100%), a realization of which is denoted \(\hat{a}_i\). We’ll keep it simple and assume that \(z.a_{i}\) is a linear function of \(\hat{r}_i\), which we then pass through the inverse-logit function. That is,
\[ \begin{aligned} z.a_{i} & = \alpha + \beta \cdot \hat{r}_i \\ \hat{a}_{i} & = \frac{1}{1+e^{-z.a_{i}}} \end{aligned} \]
Assuming \(\alpha = -1\) and \(\beta = 0.01\), we simulate the data like so:
alpha <- -1
beta <- 0.01
z.a <- alpha + beta * rhat
head(z.a)
## [1] 1.119 1.263 0.958 1.173 1.012 1.277
a<-1/(1+exp(-z.a))
head(a)
## [1] 0.7538 0.7796 0.7227 0.7637 0.7333 0.7820
ahat <- rnorm(n=n, mean=a,sd=.025)
head(ahat)
## [1] 0.7420 0.7609 0.7291 0.7966 0.6997 0.7730
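Incidentally, the hand-rolled inverse-logit above is the same function as base R’s `plogis`, which makes for an easy sanity check:

```r
z <- c(-2, 0, 1.2)
# 1/(1 + exp(-z)) is the inverse-logit, i.e. the logistic CDF
all.equal(1 / (1 + exp(-z)), plogis(z))  # TRUE
```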
Let’s pretend, for now, that this is the only non-MAP assessment we have. Consequently, all we have observed are the \(\hat{a}_i\) and the current RIT scores \(\hat{r}_i\). Here’s the simplified model in JAGS/BUGS syntax:
model.jags <- "
model {
  for(i in 1:n) {
    r[i] ~ dnorm(211.39, 1/pow(5.83, 2)) # JAGS dnorm takes precision = 1/sd^2
    rhat[i] ~ dnorm(r[i], tau)
    z[i] <- alpha + beta*rhat[i]
    a[i] <- 1/(1 + exp(-z[i]))
    ahat[i] ~ dnorm(a[i], tau.a)
  }
  # priors
  beta ~ dnorm(0, .0001)
  alpha ~ dnorm(0, .0001)
  tau ~ dgamma(1e-3, pow(15, 2)*1e-3)
  tau.a ~ dgamma(1e-3, pow(15, 2)*1e-3)
  sigma.a <- pow(tau.a, -1/2)
}
"
model.spec<-textConnection(model.jags)
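One thing worth keeping in mind when reading the model: JAGS parameterizes `dnorm` with a precision (1/variance), not a standard deviation. Converting the school-norm sd of 5.83 used above:

```r
sd.norm <- 5.83
tau.norm <- 1 / sd.norm^2  # precision = 1/variance
round(tau.norm, 4)         # 0.0294
```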
Now run the model and take a look at \(\alpha\) and \(\beta\) to see how the estimates compare to the actual values (-1 and 0.01, respectively):
fit.jags <- jags.model(model.spec,
                       data = list('ahat' = ahat,
                                   'rhat' = rhat,
                                   'n' = n),
                       n.chains = 4,
                       n.adapt = 100)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 829
##
## Initializing model
samps.jags <- jags.samples(fit.jags,
                           c('beta', 'alpha'),
                           10000)
samps.jags
## $alpha
## mcarray:
## [1] -1.314
##
## Marginalizing over: iteration(10000),chain(4)
##
## $beta
## mcarray:
## [1] 2.094
##
## Marginalizing over: iteration(10000),chain(4)
Now let’s look at the estimated \(a_i\)’s and \(r_i\)’s (the latter being our variable of interest):
samps.coda <- coda.samples(fit.jags,
                           c('a', 'r', 'beta', 'alpha'),
                           10000)
summary.coda<-summary(samps.coda)
plot(samps.coda)
Let’s extract the estimated \(r_i\)’s from the coda samples:
# rows 1-90 are the a[i]'s, rows 91-92 are alpha and beta, so the r[i]'s start at row 93
r.estimated <- as.vector(summary.coda$statistics[93:nrow(summary.coda$statistics), "Mean"])
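Hard-coding row 93 is brittle if the monitored parameters change. A sketch of a name-based alternative, assuming (as coda does here) that the rownames of the statistics matrix look like `r[1]`, `r[2]`, and so on; the toy matrix and its values are purely illustrative:

```r
# Toy statistics matrix mimicking coda's summary rownames (hypothetical values)
stats <- matrix(c(0.75, -1.3, 2.1, 211.4, 212.6), ncol = 1,
                dimnames = list(c("a[1]", "alpha", "beta", "r[1]", "r[2]"), "Mean"))
# Select the r[i] rows by name instead of by position
r.rows <- grep("^r\\[", rownames(stats))
as.vector(stats[r.rows, "Mean"])  # c(211.4, 212.6)
```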
rs<-data.table(Actual=r, Estimated=r.estimated)
rs[,Residual:=Estimated-Actual]
rs
## Actual Estimated Residual
## 1: 210.5 211.4 0.9496
## 2: 224.7 212.6 -12.0569
## 3: 201.0 210.1 9.0604
## 4: 215.7 211.9 -3.7798
## 5: 203.4 210.6 7.1648
## 6: 229.2 212.7 -16.4871
## 7: 222.6 212.7 -9.9147
## 8: 221.7 212.1 -9.6731
## 9: 222.5 212.2 -10.2661
## 10: 202.0 210.4 8.4643
## 11: 193.5 210.0 16.4232
## 12: 213.0 211.5 -1.4813
## 13: 214.2 211.8 -2.4588
## 14: 213.3 211.3 -1.9059
## 15: 213.1 211.4 -1.6779
## 16: 206.4 211.2 4.7066
## 17: 229.3 213.2 -16.0788
## 18: 209.6 211.0 1.4347
## 19: 216.7 211.9 -4.7212
## 20: 204.7 211.2 6.4510
## 21: 209.2 211.4 2.2195
## 22: 208.2 211.1 2.8945
## 23: 212.6 211.5 -1.0365
## 24: 209.5 211.2 1.7048
## 25: 209.8 211.1 1.2732
## 26: 207.6 211.0 3.4076
## 27: 215.0 212.2 -2.8091
## 28: 208.1 211.0 2.8728
## 29: 208.1 210.8 2.7022
## 30: 208.0 211.1 3.0776
## 31: 207.2 210.9 3.7622
## 32: 215.4 211.8 -3.6138
## 33: 213.8 211.4 -2.3533
## 34: 215.1 212.0 -3.1443
## 35: 215.3 212.4 -2.8533
## 36: 223.7 212.1 -11.5859
## 37: 211.8 211.5 -0.2948
## 38: 212.0 211.7 -0.2107
## 39: 218.1 211.7 -6.3841
## 40: 224.3 212.3 -11.9833
## 41: 211.3 210.9 -0.4535
## 42: 216.6 212.1 -4.5516
## 43: 210.6 211.3 0.7251
## 44: 204.2 210.5 6.2908
## 45: 205.0 210.8 5.7989
## 46: 200.4 210.4 10.0096
## 47: 209.8 211.6 1.7208
## 48: 213.0 211.7 -1.3344
## 49: 218.3 211.9 -6.3385
## 50: 207.9 210.7 2.8628
## 51: 215.0 211.3 -3.6759
## 52: 217.8 212.0 -5.7658
## 53: 211.7 211.6 -0.1288
## 54: 203.4 210.5 7.1741
## 55: 213.2 211.2 -2.0599
## 56: 215.3 211.7 -3.5374
## 57: 215.0 211.4 -3.6391
## 58: 210.6 211.3 0.7337
## 59: 204.3 211.0 6.6569
## 60: 199.7 210.4 10.6998
## 61: 216.5 212.0 -4.4888
## 62: 213.5 211.5 -2.0445
## 63: 216.8 211.9 -4.9294
## 64: 207.0 211.1 4.1347
## 65: 198.1 210.1 11.9914
## 66: 218.8 212.3 -6.4822
## 67: 214.9 212.1 -2.7865
## 68: 209.8 211.9 2.0764
## 69: 214.5 212.1 -2.3663
## 70: 213.6 211.5 -2.0747
## 71: 202.1 210.2 8.0874
## 72: 215.9 211.8 -4.0497
## 73: 206.5 210.7 4.2525
## 74: 215.4 212.0 -3.3728
## 75: 212.4 211.5 -0.8916
## 76: 224.3 212.4 -11.9353
## 77: 211.4 211.3 -0.1775
## 78: 216.2 211.8 -4.4212
## 79: 226.6 212.7 -13.8986
## 80: 215.2 211.6 -3.6235
## 81: 206.2 210.8 4.5729
## 82: 215.6 211.9 -3.6398
## 83: 221.2 212.4 -8.8432
## 84: 215.6 212.0 -3.6263
## 85: 210.8 211.5 0.7482
## 86: 205.4 210.7 5.3160
## 87: 211.9 211.4 -0.4862
## 88: 207.1 211.8 4.6601
## 89: 207.9 210.9 2.9717
## 90: 210.0 211.7 1.6955
## Actual Estimated Residual
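The summary numbers quoted below fall straight out of the Residual column: with the full table in hand they are just `mean(rs$Residual)` and `mean(rs$Residual^2)`. A minimal base-R illustration using only the first three residuals from the table above:

```r
# First three rows of rs$Residual from the table above (illustrative subset)
residual <- c(0.9496, -12.0569, 9.0604)
mean(residual)    # mean residual of the subset
mean(residual^2)  # mean squared residual of the subset
```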
Nicely, the mean residual is -0.7849 and the mean squared residual is 38.6237. Not too shabby!